Incremental
load is defined as the activity of loading only new or updated records
from the database into target table. Incremental loads are useful
because they run very efficiently when compared to full loads,
particularly so for large data sets. Sqoop
provides an incremental import mode which can be used to retrieve
only rows newer than some previously-imported set of rows.
The following arguments control incremental imports:
The following arguments control incremental imports:
Sqoop
supports two types of incremental imports:
- append
- lastmodified.
You can use the --incremental argument to specify the type of incremental import to perform. You
should specify append
mode when
importing a table where new rows are continually being added with
increasing row id values. You specify the column containing the row’s
id with --check-column. Sqoop imports
rows where the check column has a value greater than the one
specified with --last-value.
An
alternate table update strategy supported by Sqoop is called lastmodified called
mode. You should use this
when rows of the source table may be updated, and each such update
will set the value of a last-modified column to the current
timestamp. Rows where the check column holds a timestamp more recent
than the timestamp specified with --last-value
are imported.
At the
end of an incremental import, the value which should be specified as --last-value
for a subsequent import is
printed to the screen. When running a subsequent import, you should
specify --last-value in this way to
ensure you import only the new or updated data. This is handled
automatically by creating an incremental import as a saved job, which
is the preferred mechanism for performing a recurring incremental
import. See the section on saved jobs later in this document for more
information.The
process to perform incremental data load in Sqoop is to synchronize
the modified or updated data (often referred as delta data) from
RDBMS to Hadoop. The
delta data can be facilitated through the incremental load command in
Sqoop. Incremental
load can be performed by using Sqoop import command or by loading the
data into hive without overwriting it. The different attributes that
need to be specified during incremental load in Sqoop are-- Mode (incremental) –The mode defines how Sqoop will determine what the new rows are. The mode can have value as Append or Last Modified.
- Col (Check-column) –This attribute specifies the column that should be examined to find out the rows to be imported.
- Value (last-value) –This denotes the maximum value of the check column from the previous import operation.
Login Cloudera Mysql DataBase
mysql -uroot -pcloudera
mysql -uroot -pcloudera
create
table emp_incr(empid int primary key,ename varchar(50),esal int);
insert
into emp_incr values(100,'EMP1',12000);
insert into emp_incr values(101,'EMP2',13000);
insert into emp_incr values(102,'EMP3',14000);
insert into emp_incr values(103,'EMP4',15000);
insert into emp_incr values(105,'EMP5',16000);
insert into emp_incr values(101,'EMP2',13000);
insert into emp_incr values(102,'EMP3',14000);
insert into emp_incr values(103,'EMP4',15000);
insert into emp_incr values(105,'EMP5',16000);
New
Records:-
insert
into emp_incr values(106,'EMP6',14000);
insert
into emp_incr values(107,'EMP7',15000);
insert
into emp_incr values(108,'EMP8',16000);
Incremental Append
sqoop
import
--connect jdbc:mysql://localhost:3306/dwdev --username root
--password cloudera --table emp_incr --target-dir=/sqoop/Import_incr
-m 1;
Incremental Last Modified
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp --target-dir=/sqoop/Import -m 1;
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp_incr --incremental lastmodified --check-column esal --last-value 15000 --target-dir=/sqoop/Import_incr -m 1;
No comments:
Post a Comment